Abstract

Previous studies have found that formulaic sequences are processed faster than matched novel phrases.
Despite the variety of formulaic types, few studies have compared the processing of different types of
formulaic sequences. The present study compared the processing of idioms, speech formulae and written
formulae. In addition to a processing advantage for formulaic sequences over nonformulaic phrases, it found
that frequent written formulaic sequences were processed faster than infrequent idioms. The results suggest
that, where the processing advantage is concerned, both the holistic storage view and the frequency effect
need to be considered.

1. Introduction

Formulaic sequences, such as idioms, speech formulae, collocations and binomial expressions, are widely used
in native-like communication (Oppenheim, 2000; Foster, 2001; Erman & Warren, 2000). Since Pawley and
Syder’s (1983) classic work on the role of formulaic sequences in native speakers’ speech, the past
thirty-two years have witnessed rapid progress in the field. The Annual Review of Applied Linguistics (ARAL)
devoted its 2012 volume to research on formulaic language, covering different aspects of the field.
Among them, the processing of formulaic sequences has attracted increasing interest.

2. Literature Review 

Conklin and Schmitt (2012) divided research on the processing of formulaic sequences into two main
strands: the processing of idioms and the processing of nonidiomatic formulaic sequences. Idioms, as a type
of non-transparent expression, have been regarded as the prototype of formulaic language (Nekrasova, 2009).
Researchers have focused on two aspects of idiom processing: the literal vs. figurative interpretations of
idioms and the processing of idioms vs. novel phrases.

As for the competition between the literal and figurative meanings of idioms, researchers have proposed
different models to explain the phenomenon. For example, the Lexical Representation Hypothesis proposed by
Swinney and Cutler (1979) argues that speakers begin computing the literal meaning and activating the
figurative meaning almost at the same time. Because idioms are stored like morphologically complex words,
the figurative meaning is activated first. The Idiom Decomposition Hypothesis (Gibbs, Nayak, & Cutting,
1989), on the other hand, suggests that the way an idiom is processed depends on whether it is decomposable.
Decomposable idioms are analyzed linguistically, and the idiomatic meaning is consistent with the result of
that analysis. This consistency saves processing time, so decomposable idioms enjoy a processing
advantage. By contrast, the Configuration Hypothesis (Cacciari & Tabossi, 1988) proposes that, at the very
beginning, both the component words of an idiom and their literal meaning are activated. As discourse
information accumulates, the idiom is identified as a fixed item, and at that point the figurative meaning
is retrieved.

In addition to the above models, researchers have examined the processing of idioms in particular
experimental settings. Tabossi, Fanari and Wolf (2009) showed that native speakers respond to both
decomposable and non-decomposable idioms more quickly than to matched literal phrases. Underwood, Schmitt,
and Galpin (2004) investigated the processing of idioms embedded in a reading text and found that native
speakers fixated less often (and for a shorter duration) on the terminal words of idioms than on
nonformulaic words. In a more recent study, Siyanova-Chanturia, Conklin, and Schmitt (2011) showed that
native speakers process idioms significantly faster than nonformulaic phrases.

In brief, previous studies have consistently shown a speed advantage for idioms over novel phrases.
However, choosing idioms as the focus of processing research raises some problems (Conklin
& Schmitt, 2012): first, some idioms may be unfamiliar to nonnative speakers and L1 children, which
may be an intervening factor in certain studies; second, the figurative and literal meanings of idioms may
introduce ambiguity in processing; third, idioms differ in their degree of transparency, which influences
how they are processed. Hence, Conklin and Schmitt (2012) suggested that non-idiomatic formulaic sequences
are a better test case.

Some researchers have turned to the comparison between non-idiomatic formulaic sequences and
matched novel ones. A recent eye-tracking study by Siyanova-Chanturia, Conklin and Van Heuven (2011)
investigated the processing of formulaic sequences of different phrasal frequency. It found, first, that
frequent formulaic sequences are processed faster than less frequent ones and, second, that regardless of
frequency, native speakers process entrenched binomials significantly faster than their reversed forms.
Tremblay and Baayen (2010) used behavioral and electrophysiological measures to explore the processing of
the phrase “in the middle of”. They found a frequency effect for this four-word expression. Although the
evidence is somewhat incomplete, the above findings suggest that native speakers may process frequent
formulaic sequences differently from less frequent ones.

To summarize, idioms and non-idiomatic formulaic sequences have been researched separately as far as
processing is concerned. It is not clear whether both types enjoy a similar processing advantage, or whether
the advantage for both types arises from the same effect. Few studies have compared processing
across different types of formulaic sequences. The present study therefore aims, first, to confirm the
previous finding that formulaic sequences are processed faster than matched novel phrases and, second, to
explore whether native speakers process different types of formulaic sequences (i.e. idioms, speech formulas
and written formulas) differently.

3. The Present Study 

3.1 Research Questions 

a) Do native speakers process formulaic sequences and the matched novel phrases in different ways? 

b) Do native speakers process different types of formulaic sequences in different ways?

3.2 Method 

3.2.1 Participants 

Twenty native speakers of English participated in the study. All were students at a British
university (8 graduate and 12 undergraduate students); 12 were female and 8 were male. Their ages ranged
from 18 to 29.

3.2.2 Research Design 

The research material (see Appendix) consisted of three parts: a) three types of English formulaic
sequences, namely idioms, speech formulaic sequences and formulaic sequences in academic writing (i.e. written
formulaic sequences), with 10 items of each type, chosen from corpus-based studies (Biber et al., 1999;
Nattinger & DeCarrico, 1992); b) 30 matched novel phrases, one constructed for each formulaic sequence by
replacing one or two of its words with words of similar length (in terms of number of syllables) and word
frequency. For example, for the idiom bear in mind, the first word bear was replaced by
hold to form the non-formulaic sequence hold in mind. We made sure that the
number of syllables of the replacement items was equal to or smaller than that of the formulaic sequence, and
that the replacement words were matched in frequency on the basis of the BNC frequency list (Leech et al.,
2001); c) 15 ungrammatical sequences, constructed to serve as distractors.

From the 75 items, we created two counterbalanced material lists, each of which included 15 formulas (5
for each formula type), 15 control novel phrases, and 15 ungrammatical sequences (see Appendix).
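
One way to build two lists of this shape can be sketched as follows. The split rule used here (the first five pairs of each type contribute their formulaic member to list A and their novel member to list B, and the reverse for the last five) is an assumption made for illustration; the assignment rule actually used is not described above. The function name build_lists and the placeholder item labels are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (formulaic sequence, matched novel phrase)


def build_lists(pairs_by_type: Dict[str, List[Pair]],
                ungrammatical: List[str]) -> Tuple[List[dict], List[dict]]:
    """Split item pairs into two counterbalanced lists (assumed scheme)."""
    list_a: List[dict] = []
    list_b: List[dict] = []
    for ftype, pairs in pairs_by_type.items():
        half = len(pairs) // 2
        for i, (formula, novel) in enumerate(pairs):
            formulaic_item = {"text": formula, "type": ftype, "condition": "formulaic"}
            novel_item = {"text": novel, "type": ftype, "condition": "novel"}
            # A participant never sees both members of the same pair.
            if i < half:
                list_a.append(formulaic_item)
                list_b.append(novel_item)
            else:
                list_a.append(novel_item)
                list_b.append(formulaic_item)
    # Both lists contain all ungrammatical distractors.
    for text in ungrammatical:
        item = {"text": text, "type": "none", "condition": "ungrammatical"}
        list_a.append(dict(item))
        list_b.append(dict(item))
    return list_a, list_b


# Placeholder labels stand in for the real items listed in the Appendix.
pairs = {t: [(f"{t}_formula_{i}", f"{t}_novel_{i}") for i in range(10)]
         for t in ("idiom", "speech", "written")}
distractors = [f"ungrammatical_{i}" for i in range(15)]
list_a, list_b = build_lists(pairs, distractors)
```

With 10 pairs per type and 15 distractors, each resulting list holds 15 formulas (5 per type), 15 novel phrases and 15 ungrammatical sequences, which matches the list composition described above.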

The study had a 3 × 2 × 2 design, with formula type (idiom vs. speech formulaic sequence vs. written
formulaic sequence), item grammaticality (grammatical vs. ungrammatical) and formulaicity (formulaic
vs. nonformulaic) as factors. All variables were within-subject variables.

3.2.3 Procedures

In this study, the items were presented on a computer screen one at a time (with a 5 s interval) in random
order. The participants were required to judge whether or not each item was grammatical. To respond, they
pressed the key “q” for grammatical items and “p” for ungrammatical ones (Note 1).

Participants’ reaction times and error rates were collected for data analysis. For item presentation and data
collection, we used the computer program PsychoPy, developed by Peirce (2007) at the University of Nottingham.
Each participant was randomly assigned to one of the two test sets and tested individually. Prior to the test,
participants read the written instructions and completed a training session of 20 practice items.
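
The scoring of key presses described above can be sketched in a few lines. This is a minimal illustration, not the script used in the study: the trial fields (expected, key, rt_ms) and the function name score_trials are hypothetical, and averaging reaction time over correctly judged trials only is an assumption, since the procedure does not say how incorrect trials were treated.

```python
from statistics import mean
from typing import Dict, List

# Key mapping taken from the procedure: "q" = grammatical, "p" = ungrammatical.
KEY_MEANING = {"q": "grammatical", "p": "ungrammatical"}


def score_trials(trials: List[dict]) -> Dict[str, float]:
    """Return mean reaction time (ms) and error rate (%) for a set of trials.

    Each trial is a dict with hypothetical fields:
      'expected' : 'grammatical' or 'ungrammatical'
      'key'      : the key pressed, 'q' or 'p'
      'rt_ms'    : reaction time in milliseconds
    """
    errors = [KEY_MEANING[t["key"]] != t["expected"] for t in trials]
    # Assumption: reaction time is averaged over correctly judged trials only.
    correct_rts = [t["rt_ms"] for t, wrong in zip(trials, errors) if not wrong]
    return {
        "mean_rt_ms": mean(correct_rts) if correct_rts else float("nan"),
        "error_rate_pct": 100.0 * sum(errors) / len(trials),
    }


# Made-up trials for illustration only (not data from the study).
example = [
    {"expected": "grammatical", "key": "q", "rt_ms": 900},
    {"expected": "grammatical", "key": "p", "rt_ms": 1300},
    {"expected": "ungrammatical", "key": "p", "rt_ms": 1100},
    {"expected": "grammatical", "key": "q", "rt_ms": 1000},
]
result = score_trials(example)
```

Applying the same function separately to the trials of each sequence type would give one mean reaction time and one error rate per type and participant.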

3.2.4 Results and Analysis 

The reaction time and error rate for all types of sequences were calculated for analysis. Descriptive
statistics for reaction time and error rate are shown in Table 1. For the within-subject analysis, the
GLM Repeated Measures procedure in SPSS was used.


Table 1. Native speakers’ mean reaction time (in milliseconds) and error rate (in percentage) on different types of 
sequences 



                          Reaction time       Error rate (percentage)
                          Mean (SD)           Mean (SD)

Formulaic sequences
  Idiom                   946 (257)           2 (6.32)
  Speech                  897 (207)           1 (3.16)
  Written                 865 (243)           1 (3.16)

Non-formulaic
  Idiom replacement       1209 (434)          18 (17.5)
  Speech replacement      1103 (368)          12 (14.76)
  Written replacement     1140 (447)          7 (10.29)

Ungrammatical             1179 (400)          15.3 (20.31)


GLM Repeated Measures analysis of reaction time showed that reaction time differed significantly
across the different types of sequences, F1(6, 2.26) = 23.83, p < .001, partial eta squared
(η²) = .412. Specifically, the reaction time on idioms (946 ms) was significantly longer than that on written
formulaic sequences (865 ms), p = .014, whereas there was no significant difference between the reaction time
on idioms (946 ms) and on speech formulaic sequences (897 ms) (p = .078), or between the reaction time on
speech formulaic sequences (897 ms) and on written formulaic sequences (865 ms) (p = .226). When formulaic
sequences and novel phrases were compared, the reaction time on each type of formulaic sequence was
significantly shorter than that on each type of matched novel phrase (p < .001). As for the comparison
between the ungrammatical sequences and the other types of sequences, the reaction time on the ungrammatical
sequences (1179 ms) was significantly longer than those on the three types of formulaic sequences (946, 897
and 865 ms for idioms, speech formulaic sequences and written formulaic sequences, respectively) (p < .001),
and significantly longer than the reaction time on novel speech phrases (1103 ms) (p = .011).
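
To make the size of the formulaic advantage easier to read off, the short sketch below computes, for each type, the difference between the mean reaction times reported in Table 1. It is only descriptive arithmetic on the published means; it does not reproduce the significance tests, and the variable names are illustrative.

```python
# Mean reaction times (ms) copied from Table 1.
formulaic_rt = {"idiom": 946, "speech": 897, "written": 865}
novel_rt = {"idiom": 1209, "speech": 1103, "written": 1140}

# Difference between each type of matched novel phrase and its formulaic counterpart.
advantage_ms = {k: novel_rt[k] - formulaic_rt[k] for k in formulaic_rt}

# The same difference expressed as a share of the novel-phrase reaction time.
advantage_pct = {k: round(100 * advantage_ms[k] / novel_rt[k], 1) for k in formulaic_rt}

for k in formulaic_rt:
    print(f"{k}: {advantage_ms[k]} ms faster ({advantage_pct[k]}% of novel-phrase RT)")
```

On these means, the formulaic items were judged 263 ms (idioms), 206 ms (speech) and 275 ms (written) faster than their matched novel phrases.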

GLM Repeated Measures analysis of error rate revealed that error rates differed significantly among the
different types of sequences, F2(6, 4.59) = 2.58, p = .028 (p < .05), partial eta squared (η²) = .223.
Specifically, a pattern emerged when the error rate of the idiom replacements was compared with that of the
other types of sequences. The error rate for idiom replacements (18%) was significantly higher than that for
idioms (2%) (p = .022), speech formulaic sequences (1%) (p = .019), written formulaic sequences (1%)
(p = .019), and written formula replacements (7%) (p = .04). This suggests
that native speakers made more errors when judging the sequences that were constructed on the basis of
idioms.

4. Discussion

The present study aimed to explore whether native speakers process different types of formulaic
sequences (idioms, speech formulaic sequences and written formulaic sequences) in different ways. The results
showed that:

i) Native speakers processed the formulaic sequences significantly faster than the matched
novel phrases.

ii) Native speakers made significantly more errors when judging the matched novel phrases
than when judging the formulaic sequences.

iii) Among the three types of formulaic sequences, native speakers took longest to process idioms and
processed written formulaic sequences fastest.

As for Research Question 1, the first finding supports the previous argument that formulaic sequences enjoy
a processing advantage over their matched novel phrases. In this study, the frequency and length of the
formulaic sequences and their matched novel phrases were controlled. As such, the
reaction time difference is unlikely to have been substantially influenced by these factors. The processing
advantage can be explained by the Heteromorphic Distributed Lexicon (HDL) model proposed by Wray (2002).
According to HDL, the mental lexicon is made up of five lexicons serving different functions: a) a lexicon
serving a grammatical role in the production of novel sentences; b) referential expressions, including mono-
and polymorphemic words, and word strings such as idioms; c) context-dependent words and expressions that show
little creativity and are used mainly for communication; d) memorized texts; and e) expressions that serve as
automatic responses to different types of stimuli (i.e. external or psychological). Three forms of units are
stored in the lexicon: morphemes, words, and word strings. In HDL, lexical units can thus be represented in
different forms. For example, bear in mind is stored holistically in the lexicon as a word string, whereas
bear, in and mind can be stored in the referential lexicon separately as three individual words. As an idiom,
bear in mind is in most cases retrieved from the mental lexicon and processed holistically; only in a
biased context that requires structural analysis is it fully analyzed.

In the grammaticality judgment test of this study, participants had to analyze the syntactic structure of the
novel items, since these are not represented as word strings in the mental lexicon. This syntactic analysis
takes a relatively long time. The formulaic sequences, however, are
assumed to be stored as word strings (e.g. idioms), as context-dependent expressions used in communication
(e.g. written or speech formulaic sequences) or as automatic responses to stimuli (e.g. speech formulaic
sequences). They can be retrieved holistically from the mental lexicon, which results in a shorter reaction
time in the judgment.

Similarly, the second finding of the study, that formulaic sequences were judged more accurately than
non-formulaic sequences, can also be explained by HDL. Formulaic sequences are stored in the
mental lexicon holistically, although their elements are stored separately as well. When judging the
grammaticality of formulaic sequences, participants only need to match the presented items with items in the
mental lexicon, and no syntactic analysis is involved. The existing lexical entries and the direct matching
support the accuracy of the judgment. For the non-formulaic novel phrases, however, participants could not
match them with holistically stored entries. They had to analyze the grammaticality of the
structure, which not only takes additional processing time but also risks judgment errors within a
limited time.

For Research Question 2, the comparison is among the different types of formulaic sequences. Few studies in
the literature have compared idioms with other types of formulaic sequences. In this study,
idioms were found to be processed with significantly longer reaction times than written formulaic sequences.
This finding cannot be explained by HDL. According to HDL, formulaic sequences of all types are
assumed to be stored holistically in the mental lexicon. As such, participants should process them in a
similar way, which would result in similar reaction times for different types of items. This is incompatible
with the third finding of the study.

As for the third finding, usage-based models can account for the differences. Usage-based models view language
as a statistical accumulation of experience that develops and changes as more and more utterances are
encountered (Goldberg, 2006; Tomasello, 2003). This view predicts that frequently encountered or frequently
used units, words and phrases will be processed faster than less frequent ones. In the
present study, the frequencies of the formulaic sequences and of their matched novel phrases were
controlled. However, we did not control the relative frequencies among the three types of formulaic sequences.
Since the reaction time for written formulaic sequences was significantly shorter than that for idioms,
we carried out a post hoc analysis of their frequency differences. We compared the frequencies of both types
and found that the average frequency of the written formulaic sequences (average frequency in BNC = 4667) was
significantly higher than that of the idioms (average frequency in BNC = 296) (p < .001). Hence, it may be
suggested that a frequency effect operates in addition to holistic storage. Formulaic
sequences that are frequently encountered or used may enjoy a processing advantage over infrequent formulaic
sequences, although both are stored holistically in the mental lexicon.

To conclude, this study supports the argument that formulaic sequences are processed faster than matched
novel phrases. In addition, infrequent idioms take longer to process than frequent written formulaic
sequences, although both belong to the category of formulaic sequences. The Heteromorphic Distributed
Lexicon model can account for the first finding, while a usage-based frequency effect explains the second. In
summary, it may be suggested that, where the processing advantage of formulaic sequences is concerned, the
holistic storage view and the frequency effect need to be considered together.

Appendix: Test Materials



            Formulaic sequences     Matched nonformulaic items    Ungrammatical items

Idiomatic   a piece of cake         a plate of cake               a come surprise as
            as a matter of fact     as a place                    boy in went
            for the time being      for the well-being            computer red that
            in a nutshell           in the shell                  bus the on
            up to date              up to ten                     late usual as
            bear in mind            hold in mind                  chance a stand
            come as a surprise      come to a party               at paper the look
            keep an eye on          keep him awake                way the same in
            miss the boat           take a boat                   test the in
            stand a chance          stand a test                  on conclusion
                                                                  than bigger far

Spoken      for the most part       for the best time             as longer as
            as you know             as you sleep                  context in of
            as I was saying         as he was doing               car to goes
            guess what              guess it                      time over a long
            something like that     a thing like that
            what happens is         what helps when
            by and large            big and large
            you know                you tell
            by the way              by the route
            and how about           and how might

Written     as a result             as a father
            in addition             in tradition
            in other words          in other plays
            for example             for a few
            the point is            the pay is
            in conclusion           in a feeling
            in the case of          in the tree
            to tell the truth       to tell the price
            in the same way         in the same file
            it depends on           it sits on